map representation
Embracing Dynamics: Dynamics-aware 4D Gaussian Splatting SLAM
Sun, Zhicong, Lo, Jacqueline, Hu, Jinxing
Simultaneous localization and mapping (SLAM) technology has recently achieved photorealistic mapping capabilities thanks to the real-time, high-fidelity rendering enabled by 3D Gaussian Splatting (3DGS). However, due to the static representation of scenes, current 3DGS-based SLAM encounters issues with pose drift and failure to reconstruct accurate maps in dynamic environments. To address this problem, we present D4DGS-SLAM, the first SLAM method based on 4DGS map representation for dynamic environments. By incorporating the temporal dimension into scene representation, D4DGS-SLAM enables high-quality reconstruction of dynamic scenes. Utilizing the dynamics-aware InfoModule, we can obtain the dynamics, visibility, and reliability of scene points, and filter out unstable dynamic points for tracking accordingly. When optimizing Gaussian points, we apply different isotropic regularization terms to Gaussians with varying dynamic characteristics. Experimental results on real-world dynamic scene datasets demonstrate that our method outperforms state-of-the-art approaches in both camera pose tracking and map quality.
VG-Mapping: Variation-Aware 3D Gaussians for Online Semi-static Scene Mapping
He, Yicheng, Yu, Jingwen, Chen, Guangcheng, Zhang, Hong
We propose VG-Mapping, an RGB-D online 3DGS mapping system tailored to semi-static scenes. To address this issue, (b) VG-Mapping introduces a variation-aware mapping mechanism that (c) efficiently and accurately updates the changed areas. Abstract--Maintaining an up-to-date map that accurately reflects recent changes in the environment is crucial, especially for robots that repeatedly traverse the same space. Failing to promptly update the changed regions can degrade map quality, resulting in poor localization, inefficient operations, and even lost robots. In this paper, we propose VG-Mapping, a novel online 3DGS-based mapping system tailored for such semi-static scenes. Our approach introduces a hybrid representation that augments 3DGS with a TSDF-based voxel map to efficiently identify changed regions in a scene, along with a variation-aware density control strategy that inserts or deletes Gaussian primitives in regions undergoing change. Furthermore, to address the absence of public benchmarks for this task, we construct a RGB-D dataset comprising both synthetic and real-world semi-static environments. Experimental results demonstrate that our method substantially improves the rendering quality and map update efficiency in semi-static scenes. IMUL T ANEOUS Localization and Mapping (SLAM) systems are widely applied in robotics, AR/VR.
GPL-SLAM: A Laser SLAM Framework with Gaussian Process Based Extended Landmarks
Balcฤฑ, Ali Emre, Keyvan, Erhan Ege, รzkan, Emre
We present a novel Simultaneous Localization and Mapping (SLAM) method that employs Gaussian Process (GP) based landmark (object) representations. Instead of conventional grid maps or point cloud registration, we model the environment on a per object basis using GP based contour representations. These contours are updated online through a recursive scheme, enabling efficient memory usage. The SLAM problem is formulated within a fully Bayesian framework, allowing joint inference over the robot pose and object based map. This representation provides semantic information such as the number of objects and their areas, while also supporting probabilistic measurement to object associations. Furthermore, the GP based contours yield confidence bounds on object shapes, offering valuable information for downstream tasks like safe navigation and exploration. We validate our method on synthetic and real world experiments, and show that it delivers accurate localization and mapping performance across diverse structured environments.
Globally Consistent RGB-D SLAM with 2D Gaussian Splatting
Zhong, Xingguang, Pan, Yue, Jin, Liren, Popoviฤ, Marija, Behley, Jens, Stachniss, Cyrill
Recently, 3D Gaussian splatting-based RGB-D SLAM displays remarkable performance of high-fidelity 3D reconstruction. However, the lack of depth rendering consistency and efficient loop closure limits the quality of its geometric reconstructions and its ability to perform globally consistent mapping online. In this paper, we present 2DGS-SLAM, an RGB-D SLAM system using 2D Gaussian splatting as the map representation. By leveraging the depth-consistent rendering property of the 2D variant, we propose an accurate camera pose optimization method and achieve geometrically accurate 3D reconstruction. In addition, we implement efficient loop detection and camera relocalization by leveraging MASt3R, a 3D foundation model, and achieve efficient map updates by maintaining a local active map. Experiments show that our 2DGS-SLAM approach achieves superior tracking accuracy, higher surface reconstruction quality, and more consistent global map reconstruction compared to existing rendering-based SLAM methods, while maintaining high-fidelity image rendering and improved computational efficiency.
FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment
Laina, Sebastiรกn Barbas, Boche, Simon, Papatheodorou, Sotiris, Schaefer, Simon, Jung, Jaehyung, Leutenegger, Stefan
Geometrically accurate and semantically expressive map representations have proven invaluable to facilitate robust and safe mobile robot navigation and task planning. Nevertheless, real-time, open-vocabulary semantic understanding of large-scale unknown environments is still an open problem. In this paper we present FindAnything, an open-world mapping and exploration framework that incorporates vision-language information into dense volumetric submaps. Thanks to the use of vision-language features, FindAnything bridges the gap between pure geometric and open-vocabulary semantic information for a higher level of understanding while allowing to explore any environment without the help of any external source of ground-truth pose information. We represent the environment as a series of volumetric occupancy submaps, resulting in a robust and accurate map representation that deforms upon pose updates when the underlying SLAM system corrects its drift, allowing for a locally consistent representation between submaps. Pixel-wise vision-language features are aggregated from efficient SAM (eSAM)-generated segments, which are in turn integrated into object-centric volumetric submaps, providing a mapping from open-vocabulary queries to 3D geometry that is scalable also in terms of memory usage. The open-vocabulary map representation of FindAnything achieves state-of-the-art semantic accuracy in closed-set evaluations on the Replica dataset. This level of scene understanding allows a robot to explore environments based on objects or areas of interest selected via natural language queries. Our system is the first of its kind to be deployed on resource-constrained devices, such as MAVs, leveraging vision-language information for real-world robotic tasks.
Range-based 6-DoF Monte Carlo SLAM with Gradient-guided Particle Filter on GPU
Nakao, Takumi, Koide, Kenji, Takanose, Aoki, Oishi, Shuji, Yokozuka, Masashi, Date, Hisashi
-- This paper presents range-based 6-DoF Monte Carlo SLAM with a gradient-guided particle update strategy. While non-parametric state estimation methods, such as particle filters, are robust in situations with high ambiguity, they are known to be unsuitable for high-dimensional problems due to the curse of dimensionality. T o address this issue, we propose a particle update strategy that improves the sampling efficiency by using the gradient information of the likelihood function to guide particles toward its mode. Additionally, we introduce a keyframe-based map representation that represents the global map as a set of past frames (i.e., keyframes) to mitigate memory consumption. The keyframe poses for each particle are corrected using a simple loop closure method to maintain trajectory consistency. The combination of gradient information and keyframe-based map representation significantly enhances sampling efficiency and reduces memory usage compared to traditional RBPF approaches. T o process a large number of particles (e.g., 100,000 particles) in real-time, the proposed framework is designed to fully exploit GPU parallel processing. Experimental results demonstrate that the proposed method exhibits extreme robustness to state ambiguity and can even deal with kidnapping situations, such as when the sensor moves to different floors via an elevator, with minimal heuristics.